Object-Proposal Evaluation Protocol is ‘Gameable’ (Supplement)

Authors

  • Neelima Chavali
  • Harsh Agrawal
  • Aroma Mahendru
  • Dhruv Batra
Affiliation: Virginia Tech
Abstract

The main paper demonstrated how the object proposal evaluation protocol is ‘gameable’ and performed some experiments to detect this ‘gameability’. In this supplement, we present additional details and results that support the arguments presented in the main paper. In Section 1, we list and briefly describe the object proposal algorithms used in our experiments. Details of the instance-level PASCAL Context annotations are discussed in Section 2. In Section 3, we present results on the nearly-fully annotated datasets and the cross-dataset evaluation on other evaluation metrics. Finally, Section 4 shows the per-category performance of various methods on MS COCO and PASCAL Context.

*Equal contribution. †Now at Amgen Inc.

1. Overview of Object Proposal Algorithms

Table 1 provides an overview of some popular object proposal algorithms. The symbol ∗ indicates methods we have evaluated in this paper. Note that a majority of the approaches are learning based.

2. Details of PASCAL Context Annotation

As explained in Section 5.1 of the main paper, PASCAL Context provides full annotations for the PASCAL VOC 2010 dataset in the form of semantic segmentations. A total of 459 classes have been labeled in this dataset. We split these into three categories, namely Objects/Things, Background/Stuff and Ambiguous, as shown in Tables 2, 4 and 3, respectively. Most classes (396) were put in the ‘Objects’ category; 20 of these are PASCAL categories. Of the remaining 376, we selected the 60 most frequently occurring categories and manually created instance-level annotations for them.

Statistics of New Annotations: We made the following observations on our new annotations:

• The number of instances we annotated for the extra 60 categories was about the same as the number of instances annotated for the 20 PASCAL categories in the original PASCAL VOC. This shows that about half the annotations were missing, and thus a lot of genuine proposal candidates are not being rewarded.
• Most non-PASCAL categories occupy a small percentage of the image. This is understandable given that the dataset was curated around the PASCAL categories; the other categories just happened to be in the pictures.

| Method | Code Source | Approach | Learning Involved | Metric | Datasets |
|---|---|---|---|---|---|
| objectness∗ | Source code from [1] | Window scoring | Yes, supervised; trained on 6 PASCAL classes and their own custom dataset of 50 images | Recall @ t ≥ 0.5 vs. # proposals | PASCAL VOC 07 test set, tested on unseen 16 PASCAL classes |
| selectiveSearch∗ | Source code from [2] | Segment based | No | Recall @ t ≥ 0.5 vs. # proposals, MABO, per-class ABO | PASCAL VOC 2007 test set, PASCAL VOC 2012 trainval set |
| rahtu∗ | Source code from [3] | Window scoring | Yes, two stages: generic bounding box prior learned on PASCAL VOC 2007 train set, weights for feature combination learned on the dataset released with [1] | Recall @ t > various IoU thresholds and # proposals, AUC | PASCAL VOC 2007 test set |
| randomPrim∗ | Source code from [4] | Segment based | Yes, supervised; trained on 6 PASCAL categories | Recall @ t > various IoU thresholds using 10k and 1k proposals | PASCAL VOC 2007 test set / 2012 trainval set, on 14 categories not used in training |
| mcg∗ | Source code from [5] | Segment based | Yes | NA, only segments were evaluated | NA (tested on segmentation dataset) |
| edgeBoxes∗ | Source code from [6] | Window scoring | No | AUC, Recall @ t > various IoU thresholds and # proposals, Recall vs. IoU | PASCAL VOC 2007 test set |
| bing∗ | Source code from [7] | Window scoring | Yes, supervised; on PASCAL VOC 2007 train set, 20 object classes / 6 object classes | Recall @ t > 0.5 vs. # proposals | PASCAL VOC 2007 detection complete test set / 14 unseen object categories |
| rantalankila | Source code from [8] | Segment based | Yes | NA, only segments were evaluated | NA (tested on segmentation dataset) |
| Geodesic | Source code from [9] | Segment based | Yes, for seed placement and mask construction, on PASCAL VOC 2012 Segmentation training set | VUS at 10k and 2k windows, Recall vs. IoU threshold, Recall vs. # proposals | PASCAL 2012 detection validation set |
| Rigor | Source code from [10] | Segment based | Yes, pairwise potentials between superpixels learned on BSDS-500 boundary detection dataset | NA, only segments were evaluated | NA (tested on segmentation dataset) |
| endres | Source code from [11] | Segment based | Yes | NA, only segments were evaluated | NA (tested on segmentation dataset) |

Table 1: Properties of existing bounding box approaches. ∗ indicates the methods which have been studied in this paper.

Object/Thing Classes in PASCAL Context Dataset

accordion candleholder drainer funnel lightbulb pillar sheep tire
aeroplane cap dray furnace lighter pillow shell toaster
airconditioner car drinkdispenser gamecontroller line pipe shoe toilet
antenna card drinkingmachine gamemachine lion pitcher shoppingcart tong
ashtray cart drop gascylinder lobster plant shovel tool
babycarriage case drug gashood lock plate sidecar toothbrush
bag casetterecorder drum gasstove machine player sign towel
ball cashregister drumkit giftbox mailbox pliers signallight toy
balloon cat duck glass mannequin plume sink toycar
barrel cd dumbbell glassmarble map poker skateboard train
baseballbat cdplayer earphone globe mask pokerchip ski trampoline
basket cellphone earrings glove mat pole sled trashbin
basketballbackboard cello egg gravestone matchbook pooltable slippers tray
bathtub chain electricfan guitar mattress postcard snail tricycle
bed chair electriciron gun menu poster snake tripod
beer chessboard electricpot hammer meterbox pot snowmobiles trophy
bell chicken electricsaw handcart microphone pottedplant sofa truck
bench chopstick electronickeyboard handle microwave printer spanner tube
bicycle clip engine hanger mirror projector spatula turtle
binoculars clippers envelope harddiskdrive missile pumpkin speaker tvmonitor
bird clock equipment hat model rabbit spicecontainer tweezers
birdcage closet extinguisher headphone money racket spoon typewriter
birdfeeder cloth eyeglass heater monkey radiator sprayer umbrella
birdnest coffee fan helicopter mop radio squirrel vacuumcleaner
blackboard coffeemachine faucet helmet motorbike rake stapler vendingmachine
board comb faxmachine holder mouse ramp stick videocamera
boat computer ferriswheel hook mousepad rangehood stickynote videogameconsole
bone cone fireextinguisher horse musicalinstrument receiver stone videoplayer
book container firehydrant horse-drawncarriage napkin recorder stool videotape
bottle controller fireplace hot-airballoon net recreationalmachines stove violin
bottleopener cooker fish hydrovalve newspaper remotecontrol straw wakeboard
bowl copyingmachine fishtank inflatorpump oar robot stretcher wallet
box cork fishbowl ipod ornament rock sun wardrobe
bracelet corkscrew fishingnet iron oven rocket sunglass washingmachine
brick cow fishingpole ironingboard oxygenbottle rockinghorse sunshade watch
broom crabstick flag jar pack rope surveillancecamera waterdispenser
brush crane flagstaff kart pan rug swan waterpipe
bucket crate flashlight kettle paper ruler sweeper waterskateboard
bus cross flower key paperbox saddle swimring watermelon
cabinet crutch fly keyboard papercutter saw swing whale
cabinetdoor cup food kite parachute scale switch wheel
cage curtain forceps knife parasol scanner table wheelchair
cake cushion fork knifeblock pen scissors tableware window
calculator cuttingboard forklift ladder pencontainer scoop tank windowblinds
calendar disc fountain laddertruck pencil screen tap wineglass
camel disccase fox ladle person screwdriver tape wire
camera dishwasher frame laptop photo sculpture tarp
cameralens dog fridge lid piano scythe telephone
can dolphin frog lifebuoy picture sewer telephonebooth
candle door fruit light pig sewingmachine tent

Table 2: Object/Thing Classes in PASCAL Context

Ambiguous Classes in PASCAL Context Dataset

artillery escalator ice speedbump
bedclothes exhibitionbooth leaves stair
clothestree flame outlet tree
coral guardrail rail unknown
dais handrail shelves

Table 3: Ambiguous Classes in PASCAL Context

3. Evaluation of Proposals on Other Metrics

In this section, we show the performance of different proposal methods and DMPs on the MS COCO dataset on various metrics. Fig. 1a shows performance on the Recall-vs.-IOU metric at 1000 proposals on the 20 PASCAL categories. Figs. 1b and 1c show performance on the Recall-vs.-#proposals metric at 0.5 and 0.7 IOU, respectively. Similarly, Figs. 1d, 1e, 1f and Figs. 1g, 1h, 1i show the performance of all proposal methods and DMPs on these three metrics when the 60 non-PASCAL categories and all categories, respectively, are annotated in the MS COCO dataset. These metrics demonstrate the same trend as the AUC-vs.-#proposals metric in the main paper. When only PASCAL categories are annotated (Figs. 1a, 1b, 1c), DMPs outperform all proposal methods. However, when other categories are also annotated (Figs. 1g, 1h, 1i) or the performance is evaluated specifically on the other categories (Figs. 1d, 1e, 1f), DMPs cease to be the top performers.

Finally, we also report results on the different metrics for PASCAL Context (Fig. 2) and NYU-Depth v2 (Fig. 3). They show similar trends, supporting the claims made in the paper.

4. Measuring Fine-Grained Recall

We also looked at a more fine-grained, per-category performance of proposal methods and DMPs. Fine-grained recall can be used to answer whether some proposal methods are optimized for larger or more frequent categories, i.e. whether they perform better or worse with respect to different object attributes such as area, kind of object, etc.
It is also easier to observe the change in performance of a particular method on a frequently occurring category vs. a rarely occurring one. We performed this experiment on the instance-level PASCAL Context and MS COCO datasets. We sorted/clustered all categories on the basis of:

• Average size (fraction of image area) of the category,
• Frequency (number of instances) of the category,
• Membership in ‘super-categories’ defined in the MS COCO dataset (electronics, animals, appliance, etc.): 10 pre-defined clusters of objects of different kinds. (These clusters are a subset of the 11 super-categories defined in the MS COCO dataset for classifying individual classes into groups of similar objects.)

Now, we present the plots of recall for all 80 (20 PASCAL + 60 non-PASCAL) categories for the modified PASCAL Context dataset and MS COCO. Note that the 60 non-PASCAL categories differ between the two datasets.

Trends: Fig. 4 shows the performance of different proposal methods and DMPs along each of these dimensions. In Fig. 4a, we see that recall steadily improves with object size; perhaps as expected, bigger objects are typically easier to find than smaller objects. In Fig. 4b, we see that recall generally increases as the number of instances increases, except for one outlier category. This category was found to be ‘pole’, which appears to be quite difficult to recall. Since poles are often occluded and have a long, elongated shape, it is not surprising that this number is low. Finally, in Fig. 4c we observe that some super-categories (e.g. outdoor objects) are hard to recall while others (e.g. animal, electronics) are relatively easier to recall. As can be seen in Fig. 5, the trends on MS COCO are similar to those on PASCAL Context.

Background/Stuff Classes in PASCAL Context Dataset

atrium floor parterre sky
bambooweaving foam patio smoke
bridge footbridge pelage snow
building goal plastic stage
ceiling grandstand platform swimmingpool
concrete grass playground track
controlbooth ground road wall
counter hay runway water
court kitchenrange sand wharf
dock metal shed wood
fence mountain sidewalk wool

Table 4: Background/Stuff Classes in PASCAL Context

[Figure 1, panels (a)-(i). Curves: DPM, RCNN, edgeBoxes, objectness, randomPrim, rahtu, mcg, selectiveSearch, bing; the legend of panel (e) additionally lists rantalankila and rigor. Axes: recall vs. IoU overlap threshold for panels (a), (d), (g); recall at the stated IoU threshold vs. # candidates for the other panels.]
(a) Recall vs. IOU at 1000 proposals for 20 PASCAL categories annotated in MS COCO validation dataset
(b) Recall vs. number of proposals at 0.5 IOU for 20 PASCAL categories annotated in MS COCO validation dataset
(c) Recall vs. number of proposals at 0.7 IOU for 20 PASCAL categories annotated in MS COCO validation dataset
(d) Recall vs. IOU at 1000 proposals for 60 non-PASCAL categories annotated in MS COCO validation dataset
(e) Recall vs. number of proposals at 0.5 IOU for 60 non-PASCAL categories annotated in MS COCO validation dataset
(f) Recall vs. number of proposals at 0.7 IOU for 60 non-PASCAL categories annotated in MS COCO validation dataset
(g) Recall vs. IOU at 1000 proposals for all categories annotated in MS COCO validation dataset
(h) Recall vs. number of proposals at 0.5 IOU for all categories annotated in MS COCO validation dataset
(i) Recall vs. number of proposals at 0.7 IOU for all categories annotated in MS COCO validation dataset
Figure 1: Performance of various object proposal methods on different evaluation metrics when evaluated on the MS COCO dataset.

[Figure 2, panels (a)-(i). Curves: DPM, RCNN, edgeBoxes, objectness, randomPrim, rahtu, mcg, selectiveSearch, bing. Axes as in Figure 1.]
(a) Recall vs. IOU at 1000 proposals for 20 PASCAL categories annotated in PASCAL Context dataset
(b) Recall vs. number of proposals at 0.5 IOU for 20 PASCAL categories annotated in PASCAL Context dataset
(c) Recall vs. number of proposals at 0.7 IOU for 20 PASCAL categories annotated in PASCAL Context dataset
(d) Recall vs. IOU at 1000 proposals for non-PASCAL categories annotated in PASCAL Context dataset
(e) Recall vs. number of proposals at 0.5 IOU for non-PASCAL categories annotated in PASCAL Context dataset
(f) Recall vs. number of proposals at 0.7 IOU for non-PASCAL categories annotated in PASCAL Context dataset
(g) Recall vs. IOU at 1000 proposals for all categories annotated in PASCAL Context dataset
(h) Recall vs. number of proposals at 0.5 IOU for all categories annotated in PASCAL Context dataset
(i) Recall vs. number of proposals at 0.7 IOU for all categories annotated in PASCAL Context dataset
Figure 2: Performance of various object proposal methods on different evaluation metrics when evaluated on the PASCAL Context dataset.
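
The recall metrics used in Sections 3 and 4 can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes boxes are given as (x1, y1, x2, y2) tuples, that proposals are already sorted by rank, and that a ground-truth box counts as recalled when its best IoU with the top-ranked proposals reaches the threshold. The function names (iou, recall, per_category_recall) and the toy boxes are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_overlap(gt_box, proposals, num_proposals):
    """Best IoU between one ground-truth box and the top-ranked proposals."""
    top = proposals[:num_proposals]
    return max((iou(gt_box, p) for p in top), default=0.0)


def recall(gt_boxes, proposals, num_proposals, iou_threshold):
    """Fraction of ground-truth boxes covered at IoU >= iou_threshold."""
    if not gt_boxes:
        return 0.0
    hits = sum(
        best_overlap(g, proposals, num_proposals) >= iou_threshold
        for g in gt_boxes
    )
    return hits / len(gt_boxes)


def per_category_recall(annotations, proposals, num_proposals, iou_threshold):
    """Fine-grained recall: annotations is a list of (category, box) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for category, box in annotations:
        totals[category] += 1
        if best_overlap(box, proposals, num_proposals) >= iou_threshold:
            hits[category] += 1
    return {c: hits[c] / totals[c] for c in totals}


# Toy example (assumed data): one large box and one thin, elongated box.
annotations = [("person", (0, 0, 10, 10)), ("pole", (20, 0, 22, 30))]
gt_boxes = [box for _, box in annotations]
proposals = [(1, 1, 9, 9), (0, 0, 10, 10), (18, 0, 24, 30)]  # ranked

print(recall(gt_boxes, proposals, 1, 0.7))  # 0.0: best IoU for 'person' is 0.64
print(recall(gt_boxes, proposals, 2, 0.7))  # 0.5: 'person' is now matched exactly
print(per_category_recall(annotations, proposals, 3, 0.5))
```

Sweeping iou_threshold at a fixed num_proposals gives a Recall-vs.-IOU curve, and sweeping num_proposals at a fixed threshold gives a Recall-vs.-#proposals curve. In the toy data the thin 'pole' box reaches an IoU of only 1/3 with its nearest proposal, so its per-category recall at 0.5 IOU is 0. A detail that is hidden when recall is computed only over a subset of annotated categories.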


Publication date: 2016